Smart Paper Notes

Prior-mean-assisted Bayesian optimization application on FRIB Front-End tunning

Kilean Hwang , Tomofumi Maruta , Alexander Plastun , Kei Fukushima , Tong Zhang , Qiang Zhao , Peter Ostroumov , Yue Hao

Category: Machine Learning

2022-11-11

Bayesian optimization (BO) is often used for accelerator tuning because of its high sample efficiency. However, training on large datasets scales poorly computationally, and incorporating historical data in a computationally efficient way is not trivial. Here, we use a neural network model trained on historical data as the prior mean of BO for FRIB Front-End tuning.
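To make the role of a prior mean concrete, the following is a minimal sketch of Gaussian-process regression with a non-zero prior mean, the surrogate model that BO typically relies on. It is not the authors' implementation: the squared-exponential kernel, the noise level, and the function `surrogate_prior` (a hypothetical stand-in for a neural network trained on historical data) are all assumptions made for illustration.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel for scalar inputs (an assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior_mean(x_query, xs, ys, prior_mean, noise=1e-6):
    """Posterior mean m(x*) + k*^T (K + noise I)^-1 (y - m(X))."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    # The GP only has to model the residual between data and prior mean.
    residuals = [y - prior_mean(x) for x, y in zip(xs, ys)]
    alpha = solve(K, residuals)
    return prior_mean(x_query) + sum(rbf(x_query, x) * a for x, a in zip(xs, alpha))

def surrogate_prior(x):
    """Hypothetical stand-in for a neural network trained on historical data."""
    return -(x - 1.0) ** 2
```

Near the observed points the posterior follows the measurements, while far from them it falls back to the prior mean, so only a few new samples need to be fitted by the GP itself.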


Learnable Filters for Geometric Scattering Modules

Alexander Tong , Frederik Wenkel , Dhananjay Bhaskar , Kincaid Macdonald , Jackson Grady , Michael Perlmutter , Smita Krishnaswamy , Guy Wolf

Category: Machine Learning

2022-08-15

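We propose a new graph neural network (GNN) module based on relaxations of the recently proposed geometric scattering transforms, which consist of a cascade of graph wavelet filters. Our learnable geometric scattering (LEGS) module enables adaptive tuning of the wavelets to encourage band-pass features to emerge in learned representations. Incorporating our LEGS module in GNNs enables the learning of longer-range graph relations compared to many popular GNNs, which often rely on encoding graph structure via smoothness or similarity between neighbors. Further, its wavelet priors result in simplified architectures with significantly fewer learned parameters than competing GNNs. We demonstrate the predictive performance of LEGS-based networks on graph classification benchmarks, as well as the descriptive quality of their learned features in biochemical graph data exploration tasks. Our results show that LEGS-based networks match or outperform popular GNNs, as well as the original geometric scattering construction, on many datasets, in particular in biochemical domains, while retaining certain mathematical properties of handcrafted (non-learned) geometric scattering.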
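As a rough illustration of what a band-pass graph wavelet filter looks like, the sketch below builds diffusion wavelets of the form P^t_low - P^t_high from a lazy random walk P = (I + A D^-1) / 2, as in the original geometric scattering construction. How LEGS actually parameterizes and learns the scales is not described in the abstract, so here the scales are plain function arguments, and all function names are hypothetical.

```python
def lazy_walk(adj):
    """Lazy random walk P = (I + A D^-1) / 2 for an adjacency matrix (list of lists)."""
    n = len(adj)
    deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]  # column sums
    return [[0.5 * ((1.0 if i == j else 0.0) + adj[i][j] / deg[j])
             for j in range(n)] for i in range(n)]

def matvec(P, x):
    """Multiply matrix P by vector x."""
    return [sum(p * v for p, v in zip(row, x)) for row in P]

def diffuse(P, x, steps):
    """Apply the diffusion operator P to signal x the given number of times."""
    for _ in range(steps):
        x = matvec(P, x)
    return x

def wavelet(P, x, t_low, t_high):
    """Band-pass response (P^t_low - P^t_high) x, with t_low < t_high."""
    low = diffuse(P, x, t_low)
    high = diffuse(P, low, t_high - t_low)
    return [a - b for a, b in zip(low, high)]

def scattering_feature(P, x, t_low, t_high):
    """First-order scattering-style feature: sum of absolute wavelet coefficients."""
    return sum(abs(v) for v in wavelet(P, x, t_low, t_high))
```

With fixed dyadic scales such as (1, 2), (2, 4), (4, 8) this corresponds to a handcrafted filter bank; a learnable variant would treat the scales as trainable quantities instead.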


Manifold Interpolating Optimal-Transport Flows for Trajectory Inference

Guillaume Huguet , D. S. Magruder , Oluwadamilola Fasina , Alexander Tong , Manik Kuchroo , Guy Wolf , Smita Krishnaswamy

Category: Machine Learning

2022-06-29

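Here, we present a method called Manifold Interpolating Optimal-Transport Flow (MIOFlow) that learns stochastic, continuous population dynamics from static snapshot samples taken at sporadic timepoints. MIOFlow combines dynamic models, manifold learning, and optimal transport by training neural ordinary differential equations (Neural ODEs) to interpolate between static population snapshots, penalized by optimal transport with a manifold ground distance. Further, we ensure that the flow follows the geometry by operating in the latent space of an autoencoder that we call a geodesic autoencoder (GAE). In the GAE, the latent-space distance between points is regularized to match a novel multiscale geodesic distance on the data manifold that we define. We show that this method is superior to normalizing flows, Schrödinger bridges, and other generative models designed to flow from noise to data in terms of interpolating between populations. Theoretically, we link these trajectories with dynamic optimal transport. We evaluate our method on simulated data with bifurcations and merges, as well as scRNA-seq data from embryoid body differentiation and acute myeloid leukemia treatment.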


Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Aarohi Srivastava , Abhinav Rastogi , Abhishek Rao , Abu Awal Md Shoeb , Abubakar Abid , Adam Fisch , Adam R. Brown , Adam Santoro , Aditya Gupta , Adrià Garriga-Alonso

Category: Natural Language Processing | Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2022-06-09

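Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. To inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.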


CholecTriplet2021: A benchmark challenge for surgical action triplet recognition

Chinedu Innocent Nwoye , Deepak Alapatt , Tong Yu , Armine Vardazaryan , Fangfang Xia , Zixuan Zhao , Tong Xia , Fucang Jia , Yuxuan Yang , Hao Wang

Category: Computer Vision

2022-04-10

Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as triplets of <instrument, verb, target> combination delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms by competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison between them, in-depth result analysis, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and also highlights interesting directions for future research on fine-grained surgical activity recognition which is of utmost importance for the development of AI in surgery.


MURAL: An Unsupervised Random Forest-Based Embedding for Electronic Health Record Data

Michal Gerasimiuk , Dennis Shung , Alexander Tong , Adrian Stanley , Michael Schultz , Jeffrey Ngu , Loren Laine , Guy Wolf , Smita Krishnaswamy

Category: Machine Learning | Artificial Intelligence

2021-11-19

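A major challenge in embedding or visualizing clinical patient data is the heterogeneity of variable types, including continuous lab values, categorical diagnostic codes, and missing or incomplete data. In particular, in EHR data, some variables are missing not at random (MNAR) but deliberately not collected, and are thus a source of information. For example, lab tests may be deemed necessary for some patients on the basis of a suspected diagnosis, but not for others. Here we present the MURAL forest: an unsupervised random forest for representing data with disparate variable types (e.g., categorical, continuous, MNAR). MURAL forests consist of a set of decision trees in which node-splitting variables are chosen at random, such that the marginal entropy of all other variables is minimized by the split. This allows us to also split on MNAR variables and discrete variables in a way that is consistent with the continuous variables. The end goal is to learn the MURAL embedding of patients using average tree distances between those patients. These distances can be fed to a nonlinear dimensionality reduction method such as PHATE to derive visualizable embeddings. While such methods are ubiquitous in continuous-valued datasets (such as single-cell RNA sequencing), they have not been used extensively on mixed-variable data. We showcase the use of our method on one artificial and two clinical datasets. We show that, using our approach, we can visualize and classify data more accurately than competing approaches. Finally, we show that MURAL can also be used to compare cohorts of patients via the recently proposed tree-sliced Wasserstein distances.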
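The split rule described above can be sketched as follows. This is only one reading of the abstract, not the authors' code: it assumes discrete values for the non-split variables, a numeric split variable, and a size-weighted sum of marginal entropies as the cost. The names `split_cost` and `best_threshold` are hypothetical, and the handling of MNAR entries is left out.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def split_cost(rows, split_var, threshold):
    """Size-weighted sum of marginal entropies of all other variables after a split."""
    left = [r for r in rows if r[split_var] <= threshold]
    right = [r for r in rows if r[split_var] > threshold]
    cost = 0.0
    for child in (left, right):
        if not child:
            continue
        for var in range(len(rows[0])):
            if var != split_var:
                cost += len(child) / len(rows) * entropy([r[var] for r in child])
    return cost

def best_threshold(rows, split_var):
    """Threshold on the (randomly chosen) split variable with the lowest cost."""
    candidates = sorted({r[split_var] for r in rows})[:-1]
    return min(candidates, key=lambda t: split_cost(rows, split_var, t))
```

In a forest, the split variable would be drawn at random at each node and only the threshold optimized, which is what makes the procedure unsupervised: no label column is needed to score a split.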


Benchmarking common uncertainty estimation methods with histopathological images under domain shift and label noise

Hendrik A. Mehrtens , Alexander Kurz , Tabea-Clara Bucher , Titus J. Brinker

Category: Computer Vision | Machine Learning

2023-01-03

In recent years, deep learning has seen increasing use in histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their own uncertainty and reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole-Slide-Images under domain shift using the H&E-stained Camelyon17 breast cancer dataset. Although it is known that histopathological data can be subject to strong domain shift and label noise, to our knowledge this is the first work that compares the most common methods for uncertainty estimation under these aspects. In our experiments, we compare Stochastic Variational Inference, Monte-Carlo Dropout, Deep Ensembles, Test-Time Data Augmentation, and combinations thereof. We observe that ensembles of methods generally lead to higher accuracies and better calibration, and that Test-Time Data Augmentation can be a promising alternative when an appropriate set of augmentations is chosen. Across methods, rejecting the most uncertain tiles leads to a significant increase in classification accuracy on both in-distribution and out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. We observe that the border regions of the Camelyon17 dataset are subject to label noise and evaluate the robustness of the included methods against different noise levels. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.
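The rejection experiment can be pictured with a short sketch. It assumes predictive entropy of the averaged class probabilities as the uncertainty score, which is one common choice and not necessarily the one used in the paper. The function names are hypothetical, and the same code applies whether the averaged predictions come from ensemble members, dropout passes, or augmented views.

```python
import math

def mean_probs(member_probs):
    """Average class probabilities over ensemble members (or dropout/TTA passes)."""
    n = len(member_probs)
    return [sum(p[k] for p in member_probs) / n for k in range(len(member_probs[0]))]

def predictive_entropy(probs):
    """Entropy of a predictive distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def accuracy_after_rejection(all_member_probs, labels, reject_fraction):
    """Accuracy on the tiles kept after rejecting the most uncertain fraction."""
    scored = []
    for member_probs, label in zip(all_member_probs, labels):
        probs = mean_probs(member_probs)
        pred = max(range(len(probs)), key=probs.__getitem__)
        scored.append((predictive_entropy(probs), pred == label))
    scored.sort(key=lambda s: s[0])  # most confident tiles first
    keep = scored[: max(1, round(len(scored) * (1 - reject_fraction)))]
    return sum(ok for _, ok in keep) / len(keep)
```

Sweeping `reject_fraction` from 0 upward and plotting the resulting accuracy gives an accuracy-rejection curve, a compact way to compare how well different uncertainty estimates rank errors.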


More is Better: A Database for Spontaneous Micro-Expression with High Frame Rates

Sirui Zhao , Huaying Tang , Xinglong Mao , Shifeng Liu , Hanqing Tao , Hao Wang , Tong Xu , Enhong Chen

Category: Computer Vision

2023-01-03

As one of the most important psychic stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, automatic ME recognition (MER) is becoming increasingly crucial in affective computing, and provides essential technical support for lie detection, psychological analysis, and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Although several recent spontaneous ME datasets have tried to alleviate this problem, such efforts remain few. To address this ME data hunger, we construct a dynamic spontaneous ME dataset with the largest ME data scale to date, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos elicited from 671 participants and annotated by more than 20 annotators over three years. We then apply four classical spatiotemporal feature learning models to DFME in MER experiments to objectively verify the validity of the DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate research on automatic MER and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.


PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

Xiangtai Li , Shilin Xu , Yibo Yang , Haobo Yuan , Guangliang Cheng , Yunhai Tong , Zhouchen Lin , Dacheng Tao

Category: Computer Vision

2023-01-03

Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works use separate approaches to handle thing, stuff, and part predictions without shared computation or task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, named Panoptic-PartFormer. Moreover, we find that the previous metric PartPQ is biased toward PQ. To handle both issues, we make the following contributions: Firstly, we design a meta-architecture that decouples part features and things/stuff features. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model Panoptic-PartFormer. Secondly, we propose a new metric, Part-Whole Quality (PWQ), to better measure this task from both pixel-region and part-whole perspectives. It can also decouple the errors of part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former and building on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme to further boost part segmentation quality. We design a new part-whole interaction method using masked cross attention. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with the previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70% in GFlops and 50% in parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.


Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation

Jianzong Wu , Xiangtai Li , Henghui Ding , Xia Li , Guangliang Cheng , Yunhai Tong , Chen Change Loy

Category: Computer Vision

2023-01-02

In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.
